This thesis analyzes data from the 1988 New Recruit Survey (NRS), sponsored by the United States Army Recruiting Command to study the incentives that motivate new recruits to enlist in the United States Army. Our purpose is to use discriminant analysis and logistic regression to identify the incentives that have the greatest effect on enlistees in the prime recruiting market and to compare the results of these two methods. We believe that the incentives identified will differ between high quality and non-high quality individuals, where a high quality individual is defined as one who has a high school diploma and scores in categories I through IIIA on the Armed Forces Qualification Test (AFQT). Demographic variables such as an individual's marital status and time spent in the labor force prior to enlisting in the Army were shown to influence responses to enlistment incentives. Further, factor analysis of NRS responses identified four underlying factors which influenced recruits' enlistment motivations. However, these factors differed between racial groups, and accurate models could only be developed for each racial group separately.
I. INTRODUCTION

The purpose of this thesis is twofold. The first objective is to identify the enlistment incentives that have the greatest impact on high quality enlistees in the prime recruiting market. High quality recruits are individuals who score in categories I through IIIA on the Armed Forces Qualification Test (AFQT). Prime market recruits are considered to be 17 to 21 year old male high school diploma graduates. The second objective is to compare the results of two techniques for conducting the categorical data analysis that supports the first objective: discriminant analysis and logistic regression analysis. The results of the analysis in this report will assist the U.S. Army Recruiting Command in developing advertising and compensation packages that appeal to the demonstrated concerns of high quality enlistees in the prime recruiting market. Further, identifying the expectations of recent enlistees may point to programs that fulfill these expectations and improve retention.


A. HIGH QUALITY AND PRIME MARKET 
As the technical nature of military weapons systems continues to increase, the Army will also continue to depend on higher quality soldiers to maintain its effectiveness. The Chief of Staff of the Army, General Carl E. Vuono, states that
In many conceivable contingencies potential adversaries throughout the world will enjoy numerical and geographical advantages, particularly in the early phases of a conflict. Those advantages demand that we have a high-quality force that, in turn, depends on quality people [Ref. 1:p. 12].


Regardless of these considerations, there is probably little argument that the military should be staffed by high quality soldiers. However, high quality, in terms of the needs of the Army, should be carefully defined. According to a recent Department of
the Army document, 
The affect [sic] of quality soldiers, defined as high school graduates who score in 
the top half of the Armed Forces Qualification Test (AFQT) (CAT I - IIIA), on 
individual and unit job performance is significant. Research conducted in 1989 
has shown that excellent soldiers (CAT I-IIIA) performed 10 to 25 percent better 
than lower quality (CAT IV) soldiers in specific armor, infantry, artillery, and 
signal training tasks [Ref. 2:p. 18]. 
This indicates that there is good evidence in support of the definition of high quality stated above. Additionally, this study will restrict the high quality group to those recruits who graduated from high school with a diploma as opposed to a GED. Finally, the 17 through 21 year old entry age requirement is added to the high quality definition to identify the prime recruiting market. The concern is how to target these high quality, prime market individuals and provide the incentives that will best attract them to join the Army.


B. TARGETING THE PRIME MARKET 

Simply knowing what group of potential recruits the Army wants to attract is not enough. The Army must reach those potential recruits and convince them to join. "Recruiting a quality force in the U.S. Army is predicated on adequate resources for advertising, incentive programs, and compensation..." [Ref. 2:p. 18]. Estimated Army advertising expenditures for fiscal year 1989 are nearly $120 million [Ref. 3:p. 49]. Helping the Army make the most effective use of these dollars, or of possibly reduced resources, is a major concern of this study.








C. THE NEW RECRUIT SURVEY (NRS) 

The Army's advertising agency, Young and Rubicam of New York City, uses survey information from new recruits to determine how its advertising mission will be accomplished [Ref. 4:p. 19]. This thesis uses the 1988 edition of the same survey data that Young and Rubicam uses. These data come from the New Recruit Survey (NRS), which is sponsored by the United States Army Recruiting Command and prepared by the Data Recognition Corporation. The NRS is a "multi-year survey research endeavor...conducted to measure the enlistment motivations, attitudes, knowledge, and personal characteristics of new recruits at the time of their initial entry into the U.S. Army." The U.S. Army Research Institute (ARI) developed the NRS in 1982 under the direction of the Deputy Chief of Staff of the Army for Personnel. In 1984, the U.S. Army Recruiting Command (USAREC) assumed control of the NRS, while ARI continued to administer the survey until 1986. After 1986, administration of the NRS was transferred to the Data Recognition Corporation and scheduled on a year-round basis [Ref. 5:p. ii]. Figure 1 shows the data collection schedule for the data used in this thesis. These data provide survey responses from 5,863 new recruits of the active Army. Determining the best method of analyzing these data to study the impact of enlistment incentives on new recruits is a primary concern of this thesis.


[Figure 1: NRS data collection schedule showing the months in which the Trimester 1, Trimester 2, and Trimester 3 survey forms were administered, from February through the following May (fiscal quarters including FQ2 and FQ3). Source: [Ref. 5:p. 3]]

Figure 1 1988 New Recruit Survey Data Collections


II. LITERATURE REVIEW


A. QUALITY, PERFORMANCE, AND ATTRITION 
1. AFQT Scores 
If there is any question that high quality soldiers (as defined in this thesis) 
perform better than low quality soldiers, despite the research supporting this 
statement, the military’s inadvertent experiment of the late 1970’s should provide a 
definitive answer. 
In 1980, the Department of Defense acknowledged that the aptitude battery used 
for determining enlistment eligibility between 1976 and 1980 had been 
"misnormed," which means that prospective recruits received higher scores than 
they would have received on a correctly calibrated test. As a result, many 
persons entered the services during the last half of the 1970’s who did not meet 
draft-era enlistment standards; and in fact would not have been eligible to enlist 
with corrected scores [Ref. 6:p. 2]. 
The result of this calibration error was that by 1980 nearly fifty percent of all Army recruits were in mental category IV, the lowest allowable level [Ref. 7:p. 1]. This result is shown in Figure 2.
Figure 2 Trends in high- and low-aptitude Army recruits


Results of the Army's Skill Qualification Tests (SQT), hands-on performance tests developed in the late 1970's for most Army jobs, can be used to assess the impact of this increase in low mental category recruits [Ref. 6:p. 6]. Figure 3 shows that "regardless of high school status, men in category IV (revised norms) are more likely to fail the minimum SQT standard than are persons in higher categories" [Ref. 7:p. 2]. The significance of these results is further amplified by the following finding:

Using two different types of on-the-job performance tests, and five different 
Army jobs, it has been shown that lower-aptitude recruits have significantly 
lower job-proficiency scores, and are significantly less likely to meet minimum 
proficiency standards than are higher-aptitude personnel. Therefore, the decline 
in ability standards in recent years has lowered Army manpower effectiveness by 
enlisting more personnel who are unable to meet minimum skill requirements 
[Ref. 6:p. 30].


These studies clearly indicate the need for high quality soldiers for the Army to maintain an acceptable level of performance.
Figure 3 Aptitude and Performance for Army Infantrymen


2. High School Status 

In spite of the poor job performance observed in low mental category soldiers, there is not a strong relationship between AFQT scores and attrition for first-term recruits. There is, however, "a substantial association between high school status and attrition, both during and after training..." [Ref. 7:p. 6]. Figure 4 shows that 70% of high school graduates who enlist in the Infantry complete their initial term, compared with only 48% of non-high school graduates. Any soldier who fails to complete his initial enlistment represents a substantial lost investment for the Army. Therefore, it is critical that the Army attract recruits with the greatest probability of completing their enlistment. According to the research cited, these people would be high school graduates.
Source: [Ref. 7:p. 7]

Figure 4 Army Infantrymen length of service and high school status


B. INCENTIVES, ADVERTISEMENT, AND ACCESSIONS

"Individuals choose to do something only if that choice makes them better off than other possible alternatives given their preferences and the information in their possession" [Ref. 8:p. 1]. This statement emphasizes the key to recruiting high quality soldiers and the principal issue of this thesis. To attract high quality recruits from the prime recruiting market, the Army must offer incentives that are important to these individuals. While this thesis will not specifically address advertising issues, potential recruits must receive information concerning enlistment incentives before those incentives can have any effect. Identifying the motivators that have attracted high quality recruits is critical in assisting the Army to develop incentive packages and advertising campaigns.

As stated earlier, we use data from the 1988 New Recruit Survey (NRS). These data reflect the thoughts and opinions of only those individuals who enlisted in the Army. It should be acknowledged that, to best identify the motivators that attract high quality recruits to join the Army, we would also like to have NRS data for individuals who did not enlist. Unfortunately, data corresponding to the enlistment motivation questions used in this thesis are not currently available for individuals who have not enlisted in the Army.











III. DATA BASE AND METHODOLOGY 


A. 1988 NEW RECRUIT SURVEY (NRS) 


1. Survey Characteristics 
The 1988 New Recruit Survey (NRS) was conducted in three trimesters as shown in Figure 1. The survey was administered at eight reception stations to a total of 5,863 U.S. Army active duty recruits, as shown in Tables 1 and 2 below.¹


TABLE 1 1988 NRS STATION SCHEDULE

Station             Survey Dates
Ft. Benning         10 APR 89
Ft. Bliss           20 MAR 89
Ft. Dix             05 DEC 88, 30 JAN 89
Ft. Leonard Wood    18 JUL 88, 14 NOV 88, 15 MAY 89
Ft. McClellan       29 AUG 88, 26 SEP 88, 03 APR 89
Ft. Sill            08 AUG 88, 17 OCT 88, 23 JAN 89














¹The complete survey also includes 2,242 Army National Guard and 1,626 Army Reserve recruits; however, this study is concerned only with active Army respondents.
TABLE 2 RESPONDENTS BY STATION

[Table 2: number of respondents at each reception station (legible row labels include Ft. Leonard Wood, Ft. McClellan, and Ft. Sill); the counts are not recoverable from this copy.]





Selected tabulations of general characteristics of the 1988 NRS respondents are provided in Appendix A.
2. Enlistment Motivation Questions 

The 1988 NRS contains 24 questions that specifically address the 
respondent's motivation to enlist in the Army. These 24 questions can be separated 
into two distinct groups. 

The first group contains 22 questions, each of which lists a particular reason that could motivate a person to join the Army. The respondent is then asked to rate the importance of the stated reason for his decision to enlist. The possible responses are as follows:


• The reason was not at all important
• The reason was somewhat important
• The reason was very important
• I would not have enlisted except for this reason


The final two questions that deal with enlistment motivation each list ten reasons that could motivate a person to join the Army. Each respondent is asked to choose the one reason from this list of ten that was his most important reason for enlisting.


See Appendix B for a listing of these questions. 


B. METHODOLOGY 
This thesis compares the results of discriminant analysis and logistic regression in identifying incentives that attract prime market recruits. The NRS survey data used are in SAS format, and all data analyses and techniques discussed are implemented using SAS, Version 5.18.
1. Hypothesis 
We hypothesize that the incentives which motivate prime market recruits 
to join the Army are different for high quality and non-high quality individuals. Two 
specific statistical techniques will be applied to the 1988 NRS data in order to identify 
the incentives providing the greatest motivation to high quality recruits in the prime 
market: discriminant analysis and logistic regression. The results of these techniques 
will be compared in relation to this hypothesis. 
2. Discriminant Analysis 
Procedure DISCRIM in SAS performs discriminant analysis, which classifies observations into groups based on a set of descriptive variables. The procedure generates a set of functions, one for each group, and assigns each observation to the group for which the generalized squared distance between the observation's variable values and the group's mean variable values is smallest [Ref. 9:p. 318]. The following discussion covers some of the theory behind discriminant analysis and presents an example of the SAS DISCRIM procedure.

The example uses the 1988 NRS data and variables HIQUAL, T079, and T082 (these variables are chosen for the purposes of this example only, and their selection has no other significance). Variable HIQUAL, the dependent variable, classifies each observation as either high quality or other, according to the criteria developed earlier in the thesis. The variables T079 and T082 are used as the discriminating variables. These two questions ask the respondent to rate the importance of money for college (T079) and money for vo-tech school (T082) to his decision to enlist. A rating of one indicates that the reason was of no importance to the enlistee's decision, a rating of four indicates that the respondent would not have enlisted except for that reason, and ratings of two or three indicate intermediate degrees of importance.

a. Generalized Squared Distance 

The equation used by SAS for the generalized squared distance between an observation and the mean of group t is given in Equation 1 [Ref. 9:p. 318].

$$D_t^2(\mathbf{x}) = (\mathbf{x} - \bar{\mathbf{X}}_t)' \, \mathrm{COV}^{-1} \, (\mathbf{x} - \bar{\mathbf{X}}_t)$$

where
$D_t^2(\mathbf{x})$ = generalized squared distance from $\mathbf{x}$ to group $t$
$\bar{\mathbf{X}}_t$ = vector of means of variables for group $t$
$\mathrm{COV}^{-1}$ = inverse of the pooled within-groups covariance matrix

Equation 1 Generalized Squared Distance

This equation has the same form as the Mahalanobis distance, which is the generalized squared distance between the mean variable values of two groups.

b. SAS Output

The SAS DISCRIM procedure produces a set of linear discriminant functions, one for each group in the analysis. As stated above, the functions are generated such that each observation is assigned to the group for which the generalized squared distance between the observation and the group mean is minimized. Equation 2 shows the general form of the discriminant functions.

$$L_t = -c_t + a_{1t}x_1 + a_{2t}x_2 + \cdots + a_{pt}x_p$$

where
$L_t$ = discriminant function for group $t$
$-c_t$ = constant term for group $t$
$a_{it}$ = coefficient for variable $i$, group $t$
$x_i$ = value of variable $i$

Equation 2 Discriminant Function
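To make Equation 1 concrete, the following Python sketch computes the generalized squared distance for the two-variable case and assigns an observation to the group with the nearest mean. It is a minimal illustration, not the SAS DISCRIM implementation: the function names and the example group means are hypothetical, only 2 x 2 covariance matrices are handled, and equal prior probabilities are assumed, so no prior-probability term enters the distance.

```python
def inverse_2x2(m):
    """Invert a 2 x 2 matrix [[a, b], [c, d]]; the matrix must be nonsingular."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("pooled covariance matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def generalized_sq_distance(x, group_mean, pooled_cov):
    """Equation 1: D_t^2(x) = (x - xbar_t)' COV^{-1} (x - xbar_t)."""
    inv = inverse_2x2(pooled_cov)
    diff = [xi - mi for xi, mi in zip(x, group_mean)]
    # Quadratic form diff' * inv * diff.
    return sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))


def nearest_group(x, group_means, pooled_cov):
    """Return the 1-based index of the group whose mean is closest to x."""
    distances = [generalized_sq_distance(x, mean, pooled_cov) for mean in group_means]
    return distances.index(min(distances)) + 1


# Hypothetical group means, with the pooled covariance matrix reported in Equation 3.
V = [[0.9716, 0.4439], [0.4439, 1.0493]]
group = nearest_group([1, 4], [[3, 1], [1, 3]], V)  # 2
```

With an identity covariance matrix the distance reduces to the ordinary squared Euclidean distance, which is a convenient sanity check.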


(1) Generating coefficients. The discriminant coefficients are based on the pooled within-groups covariance matrix of the discriminating (independent) variables and the mean values of the discriminating variables for each group. Let $V = [v_{ij}]$ denote this covariance matrix, and let $\bar{X}$ be the matrix whose columns are the group mean vectors $\bar{\mathbf{X}}_t$. Then, provided $V$ is nonsingular, the matrix of coefficients $A = [a_{it}]$ is given by $A = V^{-1}\bar{X}$ [Ref. 10:p. 97].








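(2) Example. The DISCRIM procedure was used with variables HIQUAL, T079, and T082 as described above. The results of this procedure, listed in Equation 3, show the process of computing the coefficients in this example. In the group mean and coefficient matrices, the rows correspond to T079 and T082 and the columns to group 1 (high quality) and group 2 (other).

POOLED WITHIN GROUPS COVARIANCE MATRIX
$$V = \begin{bmatrix} 0.9716 & 0.4439 \\ 0.4439 & 1.0493 \end{bmatrix}$$

INVERSE COVARIANCE MATRIX
$$V^{-1} = \begin{bmatrix} 1.2758 & -0.5397 \\ -0.5397 & 1.1813 \end{bmatrix}$$

GROUP MEANS
$$\bar{X} = \begin{bmatrix} 2.8765 & 2.4716 \\ 2.1035 & 2.265 \end{bmatrix}$$

COEFFICIENT MATRIX
$$A = V^{-1}\bar{X} = \begin{bmatrix} 2.535 & 1.9308 \\ 0.9319 & 1.342 \end{bmatrix}$$

CONSTANTS
$$c_t = \tfrac{1}{2}\,\bar{\mathbf{X}}_t' \, V^{-1} \, \bar{\mathbf{X}}_t, \qquad t = 1, 2$$

Equation 3 Deriving Discriminant Functions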


From these results, the equations for the linear discriminant functions are shown in Equation 4.

$$L_1 = -4.63 + 2.535(\mathrm{T079}) + 0.9319(\mathrm{T082})$$
$$L_2 = -3.90 + 1.9308(\mathrm{T079}) + 1.342(\mathrm{T082})$$

Equation 4 Example Discriminant Functions
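The arithmetic behind Equations 3 and 4 can be reproduced with a short Python sketch. It assumes the pooled covariance matrix of Equation 3 and group means of (2.8765, 2.1035) for T079 and T082 in group 1 (high quality) and (2.4716, 2.265) in group 2 (other); with these inputs it recovers the coefficients and constants of Equation 4 to within about 0.01. The constant $c_t = \tfrac{1}{2}\bar{\mathbf{X}}_t' V^{-1}\bar{\mathbf{X}}_t$ is the standard linear discriminant constant under equal prior probabilities. The function names are illustrative only, not SAS output.

```python
def inverse_2x2(m):
    """Invert a 2 x 2 matrix [[a, b], [c, d]]; the matrix must be nonsingular."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("pooled covariance matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def discriminant_functions(pooled_cov, group_means):
    """Return (constant, [a_1t, a_2t]) for each group t.

    Coefficients: a_t = V^{-1} xbar_t, one column of A = V^{-1} Xbar.
    Constant term: -c_t, where c_t = 0.5 * xbar_t' V^{-1} xbar_t = 0.5 * a_t' xbar_t."""
    inv = inverse_2x2(pooled_cov)
    functions = []
    for mean in group_means:
        coeffs = [sum(inv[i][j] * mean[j] for j in range(2)) for i in range(2)]
        c_t = 0.5 * sum(a * m for a, m in zip(coeffs, mean))
        functions.append((-c_t, coeffs))
    return functions


def classify(x, functions):
    """Return the 1-based index of the group with the largest L_t(x)."""
    scores = [const + sum(a * xi for a, xi in zip(coeffs, x)) for const, coeffs in functions]
    return scores.index(max(scores)) + 1


V = [[0.9716, 0.4439], [0.4439, 1.0493]]     # pooled covariance (Equation 3)
MEANS = [[2.8765, 2.1035], [2.4716, 2.265]]  # [T079, T082] means for groups 1 and 2
FUNCS = discriminant_functions(V, MEANS)
```

For example, `classify([4, 1], FUNCS)` returns 1: a respondent who rates money for college at 4 and money for vo-tech school at 1 scores higher on the high quality function.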


These functions are used to classify observations into the respective groups by computing a score for each function from the variable values of that observation and then classifying the observation into the group with the highest score.


(3) Computing one discriminant function. In the case of a model with only two groups, the two discriminant functions listed above can be directly converted to one equation. This is done by simply subtracting the coefficients for the second group from the coefficients for the first group, which yields the single function shown in Equation 5 [Ref. 11:p. 260].

$$L = (a_{11} - a_{12})(\mathrm{T079}) + (a_{21} - a_{22})(\mathrm{T082}) = 0.6342(\mathrm{T079}) - 0.4046(\mathrm{T082})$$

Equation 5 One Discriminant Function


Note that the constant term is not included in this equation. Instead, a dividing point $c$ is computed, where $c = c_1 - c_2$, which results in a value of 0.8130 for this example. Because the constant term in each discriminant function is $-c_t$, not $c_t$ (see Equation 3), moving the constants to the other side of the inequality $L_1 > L_2$ gives exactly this dividing point. This single function can now be used to classify the observations as well. A score for each observation is computed using the function and the variable values for that observation. If the score is greater than the dividing point $c$, the observation is classified in group one; if the score is less than the dividing point, it is classified in group two. The results are the same as the results obtained using two equations [Ref. 11:p. 260].
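The equivalence between the two-function rule and the single-function rule can be checked with a small Python sketch. It starts from the rounded functions of Equation 4, forms the coefficient differences and the dividing point $c = c_1 - c_2$, and confirms that both rules agree on every possible pair of T079 and T082 ratings (1 through 4). The helper names are hypothetical, and a score exactly equal to the dividing point is assigned to group two, an arbitrary tie-breaking choice applied to both rules.

```python
from itertools import product


def single_function(const1, coeffs1, const2, coeffs2):
    """Collapse L_1 and L_2 into one score and a dividing point.

    Each L_t has constant term -c_t, so L_1 > L_2 exactly when
    (a_11 - a_12) x_1 + (a_21 - a_22) x_2 > c_1 - c_2."""
    d = [a1 - a2 for a1, a2 in zip(coeffs1, coeffs2)]
    c = (-const1) - (-const2)
    return d, c


def classify_single(x, d, c):
    """Group 1 if the single-function score exceeds c, otherwise group 2."""
    score = sum(di * xi for di, xi in zip(d, x))
    return 1 if score > c else 2


def classify_two(x, const1, coeffs1, const2, coeffs2):
    """Group with the larger of the two discriminant scores (ties go to group 2)."""
    l1 = const1 + sum(a * xi for a, xi in zip(coeffs1, x))
    l2 = const2 + sum(a * xi for a, xi in zip(coeffs2, x))
    return 1 if l1 > l2 else 2


# Rounded discriminant functions from Equation 4: (constant, [T079, T082]).
L1 = (-4.63, [2.535, 0.9319])
L2 = (-3.90, [1.9308, 1.342])
d, c = single_function(*L1, *L2)

# Ratings run from 1 to 4, so check every (T079, T082) pair.
agree = all(
    classify_single(x, d, c) == classify_two(x, *L1, *L2)
    for x in product(range(1, 5), repeat=2)
)
```

Algebraically the two rules are identical, so `agree` is True; the check only guards against a sign slip in the dividing point.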


c. Interpretation of Coefficients 

The discriminant function coefficients indicate both the direction and the degree of contribution each variable makes in classifying an observation. Consider the coefficients for the single discriminant function. A positive value for a coefficient indicates that observations with large values for the associated variable will tend to be classified in group one, and vice versa. Further, these coefficients can be standardized by multiplying each one by the pooled within-groups standard deviation of the corresponding variable. The magnitude of a standardized coefficient indicates the contribution of that variable to the discriminant function relative to the other coefficients [Ref. 11:p. 257].
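As a sketch of the standardization step, the following Python code multiplies each coefficient of the single discriminant function (Equation 5) by the square root of the corresponding diagonal element of the pooled within-groups covariance matrix from Equation 3, then ranks the variables by the magnitude of their standardized coefficients. The function and variable names are illustrative.

```python
import math


def standardize(coeffs, pooled_cov):
    """Multiply each coefficient by its variable's pooled within-groups
    standard deviation, the square root of the matching diagonal element."""
    return [a * math.sqrt(pooled_cov[i][i]) for i, a in enumerate(coeffs)]


V = [[0.9716, 0.4439], [0.4439, 1.0493]]  # pooled covariance, Equation 3
names = ["T079", "T082"]
std = standardize([0.6342, -0.4046], V)   # coefficients from Equation 5
# std is approximately [0.6251, -0.4145]

# A larger absolute standardized coefficient means a larger relative contribution.
ranking = [names[i] for i in sorted(range(len(std)), key=lambda i: abs(std[i]), reverse=True)]
```

Because both variables are on the same 1 to 4 rating scale with similar variances, standardization changes the coefficients only slightly here; it matters more when variables are measured on different scales.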

In the example, given a coefficient of +0.6342 for the variable T079 (money for college), a high score on this variable will contribute to that observation being classified as high quality. In other words, a high quality individual will tend to be positively motivated to enlist in the Army by the incentive of earning money to attend college. On the other hand, the coefficient of -0.4046 for variable T082 (money for vo-tech school) indicates that the incentive of earning money to attend vocational or technical school has the opposite effect: a high score on T082 moves an observation toward the other group. These results seem roughly logical but may not reflect the actual motivations of recruits. This could be due to the small number of variables used and the intentionally unsophisticated nature of the example model.
d. Posterior Probabilities

All of the previous discussion has considered only the discriminant function scores for a particular observation as a method of classifying the observation into a particular group. Another method of classification uses the posterior probability of an observation belonging to each group [Ref. 11:p. 262]. The term posterior refers to the fact that the probability is computed after the observation's variable values have been taken into account. The posterior probability for a group is the probability that an observation actually belongs to that group, given its variable values. Like the discriminant scores, this probability is based on the generalized squared distance between the variable values of the observation and the mean variable values of each group. Equation 6 gives the general formula for computing posterior probabilities [Ref. 11:p. 262].


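$$p(t \mid \mathbf{x}) = \frac{\exp\left(-\tfrac{1}{2} D_t^2(\mathbf{x})\right)}{\sum_{u} \exp\left(-\tfrac{1}{2} D_u^2(\mathbf{x})\right)}$$

where $D_t^2(\mathbf{x})$ = generalized squared distance from $\mathbf{x}$ to group $t$

Equation 6 Posterior Probabilities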


The posterior probabilities are particularly useful if one wants to assign an observation to a group only when its posterior probability exceeds some threshold value. SAS uses the posterior probabilities to assign observations with the default threshold value of 0.5 (each observation is assigned to the group with the greatest posterior probability). The classification results using the default threshold value are the same as those of the two classification methods discussed previously.
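Equation 6 can be sketched in Python as follows, given the generalized squared distances from an observation to each group. The sketch assumes equal prior probabilities, as Equation 6 as written implies; the threshold argument mirrors the thresholding idea described above, and the names are illustrative rather than SAS options.

```python
import math


def posterior_probabilities(sq_distances):
    """Equation 6 with equal priors:
    p(t | x) = exp(-D_t^2 / 2) / sum_u exp(-D_u^2 / 2)."""
    # Shifting every distance by the smallest one cancels in the ratio and
    # keeps exp() from underflowing when all distances are large.
    d_min = min(sq_distances)
    weights = [math.exp(-0.5 * (d - d_min)) for d in sq_distances]
    total = sum(weights)
    return [w / total for w in weights]


def assign(sq_distances, threshold):
    """Return the 1-based group with the largest posterior probability,
    or None if that probability does not exceed the threshold."""
    probs = posterior_probabilities(sq_distances)
    best = max(range(len(probs)), key=lambda t: probs[t])
    return best + 1 if probs[best] > threshold else None
```

For two groups at distances 0 and 2, the posterior probability of the first group is 1 / (1 + e^-1), about 0.73, so the observation is assigned to group one at a threshold of 0.5 but left unassigned at a threshold of 0.9.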
3. Logistic Regression 

Procedure LOGIST in SAS performs logistic regression to generate logistic 
function coefficients to classify observations into various groups based on a set of 
explanatory variables. The following discussion covers some of the theory behind 
logistic regression, how the logistic function coefficients are generated, and presents 
an example of the SAS LOGIST procedure. 

The example uses the 1988 NRS data and the same variables as the discriminant analysis example, HIQUAL, T079, and T082, so that direct comparisons may be made (again, there is no significance to the particular explanatory variables used; they are for demonstration only).


a. SAS Output 
(1) Developing the Logit Function. In this project, as in many social science scenarios, we are interested in predicting the group membership of a particular observation. In the case of a dichotomous response variable, we can define group membership as follows:

Y=1 if the observation belongs to the first group

Y=0 if the observation belongs to the other group

Since the variable Y cannot assume continuous values, standard regression techniques are not appropriate. We can, however, use logistic regression to determine the probability that a particular observation belongs to a particular group based on the values of the explanatory variables for that observation. The logistic equation used by SAS to predict the probability that Y=1 is shown in Equation 7 [Ref. 12:p. 270].


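$$P_i = P(Y_i = 1 \mid \mathbf{x}_i) = \frac{1}{1 + e^{-(\alpha + \mathbf{x}_i\boldsymbol{\beta})}}, \qquad 0 < P_i < 1$$

where
$\mathbf{x}_i$ = the vector of variable values for the $i$th observation
$\boldsymbol{\beta}$ = vector of regression parameters
$\alpha$ = the intercept parameter

Equation 7 Logistic Function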


Now we can also define the odds of belonging to group one as the probability of belonging to group one divided by the probability of not belonging to group one. This quantity is shown in Equation 8 [Ref. 11:p. 290].

$$\text{odds} = \frac{P_i}{1 - P_i} = e^{\alpha + \mathbf{x}_i\boldsymbol{\beta}}, \qquad 0 < \text{odds} < \infty$$

Equation 8 Odds Function


Note the asymmetric ranges of both the logistic function and the odds function. By taking the natural logarithm of the odds function we can eliminate this asymmetry. The result is known as the logit function and is shown in Equation 9 below [Ref. 11:p. 290].
$$\text{logit} = \ln(\text{odds}) = \ln\left(\frac{P_i}{1 - P_i}\right) = \alpha + \mathbf{x}_i\boldsymbol{\beta}, \qquad -\infty < \text{logit} < \infty$$

Equation 9 Logit Function
Note that the logit function is similar to the discriminant function in that it is linear in the explanatory variables. The logit equation, however, has several attractive properties not found in the discriminant function that make it a good alternative for the analysis of categorical data.
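The relationships among Equations 7, 8, and 9 can be checked numerically with a few lines of Python. The sketch codes the three definitions and shows that the logit undoes the logistic function; it is an illustration, not part of the SAS LOGIST procedure.

```python
import math


def logistic(z):
    """Equation 7: P = 1 / (1 + exp(-z)), where z = alpha + x'beta.
    Branching on the sign of z keeps exp() from overflowing for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def odds(p):
    """Equation 8: odds = P / (1 - P), which lies in (0, infinity)."""
    return p / (1.0 - p)


def logit(p):
    """Equation 9: logit = ln(odds), which lies in (-infinity, infinity)."""
    return math.log(odds(p))


# The logit undoes the logistic function, and logit(0.5) = 0.
print(round(logit(logistic(1.3)), 10), logit(0.5))  # 1.3 0.0
```

The symmetry of the logit shows up directly: `logit(0.25)` and `logit(0.75)` are equal in magnitude and opposite in sign.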
The fundamental assumption in logistic regression analysis is that In(odds) is 
linearly related to the independent variables. No assumptions are made 
regarding the distributions of the X variables. In fact, one of the major 
advantages of this method is that the X variables may be discrete or continuous 
[Ref. 11:p. 291].
Discriminant analysis could be used to estimate the logistic parameters in Equation 9, but maximum likelihood estimates, which depend only on the regression model, should be used instead. Discriminant analysis assumes that the explanatory variables are multivariate normal, while maximum likelihood estimation does not. In addition, logistic regression estimates are more robust than discriminant coefficient estimates [Ref. 11:p. 291].
(2) Logistic Function Parameter Estimates. From the previous discussion, we can define the probability that observation $i$ belongs to group one ($Y_i = 1$) as $P_i$. Then the relations in Equation 10 hold [Ref. 13:p. 50].
$$P(Y_i = 1 \mid \mathbf{X}_i) = P_i$$
$$P(Y_i = 0 \mid \mathbf{X}_i) = 1 - P_i$$
$$P(Y_i \mid \mathbf{X}_i) = P_i^{Y_i}(1 - P_i)^{1 - Y_i}$$

Equation 10 Probability of $Y_i$ given $\mathbf{X}_i$
From the equations above, the probability of observing a particular sample of N values of Y given all N sets of $\mathbf{X}_i$ observations is given by Equation 11. We define this as the likelihood function [Ref. 13:p. 50].

$$L(\mathbf{Y} \mid \mathbf{X}, \mathbf{b}) = P(\mathbf{Y} \mid \mathbf{X}) = \prod_{i=1}^{N} P_i^{Y_i}(1 - P_i)^{1 - Y_i}$$

where $\mathbf{b}$ is the vector of regression coefficients

Equation 11 Likelihood Function


Now the maximum likelihood estimate of the coefficient vector $\mathbf{b}$, say $\hat{\mathbf{b}}$, is given by $L(\mathbf{Y} \mid \mathbf{X}, \hat{\mathbf{b}}) = \max_{\mathbf{b}} L(\mathbf{Y} \mid \mathbf{X}, \mathbf{b})$. Since maximizing the natural logarithm of a function is equivalent to maximizing the function itself, we take the natural logarithm of the likelihood function, which gives Equation 13. We then maximize Equation 13 over $\mathbf{b}$ to find our estimates $\hat{\mathbf{b}}$. To accomplish this, we take the first derivative of Equation 13 with respect to each element of the coefficient vector, set the resulting equations equal to zero, and solve them [Ref. 13:pp. 51-52].

$$\ln L(\mathbf{Y} \mid \mathbf{X}, \mathbf{b}) = \sum_{i=1}^{N} \left[ Y_i \ln P_i + (1 - Y_i)\ln(1 - P_i) \right]$$

Equation 13 Log-Likelihood Function
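Equations 11 and 13 can be illustrated with a short Python sketch that evaluates the likelihood and the log-likelihood for given 0/1 responses $Y_i$ and fitted probabilities $P_i$, and checks that one is the logarithm of the other. The function names and the example values are hypothetical.

```python
import math


def likelihood(y, p):
    """Equation 11: product over i of P_i**Y_i * (1 - P_i)**(1 - Y_i)."""
    result = 1.0
    for yi, pi in zip(y, p):
        result *= pi if yi == 1 else 1.0 - pi
    return result


def log_likelihood(y, p):
    """Equation 13: sum over i of Y_i ln P_i + (1 - Y_i) ln(1 - P_i).

    Since Y_i is 0 or 1, only one of the two terms is nonzero for each
    observation, so only that term is evaluated (this avoids ln 0 when a
    fitted probability is exactly 0 or 1 on the side that is not used)."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))


# Hypothetical responses and fitted probabilities.
y = [1, 0, 1, 1, 0]
p = [0.8, 0.3, 0.6, 0.9, 0.2]
```

In practice the log form is preferred: for large N the product in Equation 11 quickly becomes too small to represent in floating point, while the sum in Equation 13 does not.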


b. Example 
(1) General. As stated earlier, the SAS LOGIST procedure was used 
with variables HIQUAL, T079, and T082 to illustrate a simple example of logistic 
regression. Omitting intermediate steps, the log-likelihood function is given by Equation 14, where the subscripts 1 and 2 refer to variables T079 and T082, respectively.

$$\ln L(\mathbf{Y} \mid \mathbf{X}, \mathbf{b}) = \sum_{i=1}^{N} \left[ Y_i \ln P_i + (1 - Y_i)\ln(1 - P_i) \right]$$

where

$$P_i = \frac{1}{1 + e^{-(b_0 + b_1 x_{i1} + b_2 x_{i2})}}$$

Equation 14 Example Log-Likelihood Function


Now we take the first derivative of the log-likelihood function with respect to each $b_j$, set the resulting equations equal to zero, and solve for the estimates $\hat{b}_j$.
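The equations set to zero here generally have no closed-form solution, so they are solved iteratively. The Python sketch below uses Newton-Raphson, one standard method for this; we do not claim it mirrors the internals of SAS LOGIST. The first derivative of Equation 13 with respect to $b_j$ is $\sum_i (Y_i - P_i)\,x_{ij}$, with $x_{i0} = 1$ for the intercept. The usage example relies on a small made-up data set with one binary explanatory variable, for which the estimates have a closed form ($b_0 = \ln 3$, $b_1 = -2\ln 3$) that the iteration should reproduce; the data and names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination
    with partial pivoting (adequate for the small systems used here)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def fit_logistic(xs, ys, max_iter=50, tol=1e-10):
    """Maximum likelihood estimates [b_0, b_1, ..., b_p] by Newton-Raphson.

    Each step solves H * step = g, where g_j = sum_i (Y_i - P_i) x_ij is the
    first derivative of ln L (Equation 13) and H_jq = sum_i P_i (1 - P_i) x_ij x_iq
    is minus its second derivative; x_i0 = 1 carries the intercept."""
    rows = [[1.0] + [float(v) for v in x] for x in xs]
    k = len(rows[0])
    b = [0.0] * k
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-sum(bj * xj for bj, xj in zip(b, r)))) for r in rows]
        g = [sum((y - p) * r[j] for r, y, p in zip(rows, ys, probs)) for j in range(k)]
        h = [[sum(p * (1.0 - p) * r[j] * r[q] for r, p in zip(rows, probs))
              for q in range(k)] for j in range(k)]
        step = solve(h, g)
        b = [bj + sj for bj, sj in zip(b, step)]
        if max(abs(s) for s in step) < tol:
            break
    return b


# Hypothetical data with one binary explanatory variable: for x = 0, three of
# four responses are 1; for x = 1, one of four is 1. The MLE then satisfies
# b_0 = logit(3/4) = ln 3 and b_0 + b_1 = logit(1/4) = -ln 3.
xs = [[0]] * 4 + [[1]] * 4
ys = [1, 1, 1, 0, 1, 0, 0, 0]
b_hat = fit_logistic(xs, ys)
```

With two explanatory variables such as T079 and T082, each row of `xs` would simply hold two ratings, and the same iteration would return three estimates.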


(2) Parameter Estimates. The parameter estimates generated in this example and the corresponding logit equation are shown in Equation 15.

$$\hat{\alpha} = -0.493233, \qquad \hat{\beta}_1 = 0.648140, \qquad \hat{\beta}_2 = -0.431725$$
$$\text{logit} = -0.493233 + 0.648140\,x_1 - 0.431725\,x_2$$

Equation 15 Example Logit Equation

This equation can be used to classify observations in a manner similar to that used in discriminant analysis. We compute the log odds for each observation using the logit equation and the explanatory variable values for that observation. Since the range of the logit is symmetric about the origin, and a logit of zero corresponds to a probability of 0.5, we simply assign the observation to group one (high quality) if the resulting value is greater than zero, or to group two (other) if the resulting value is less than zero.
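The classification rule just described can be written as a short Python sketch using the estimates in Equation 15, with $x_1$ = T079 and $x_2$ = T082. It also shows that a positive logit is the same as a fitted probability above 0.5. The names are illustrative.

```python
import math

# Estimates reported in Equation 15; x_1 = T079, x_2 = T082.
ALPHA = -0.493233
BETA = (0.648140, -0.431725)


def logit_score(x):
    """Log odds of belonging to group one (high quality)."""
    return ALPHA + sum(b * xi for b, xi in zip(BETA, x))


def classify_logit(x):
    """Group 1 (high quality) if the logit is positive, otherwise group 2 (other)."""
    return 1 if logit_score(x) > 0 else 2


def prob_high_quality(x):
    """Fitted probability from the logistic function (Equation 7)."""
    return 1.0 / (1.0 + math.exp(-logit_score(x)))
```

For instance, a respondent who rates both incentives at 1 has a logit of about -0.277 and is assigned to group two, while ratings of 4 for T079 and 1 for T082 give a positive logit and an assignment to group one.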


c. Interpretation of Coefficients 

The logistic function coefficients can be interpreted in the same manner as the discriminant function coefficients. They indicate both the direction and the degree of contribution of each variable to the classification [Ref. 11:p. 257]. A positive value for a coefficient indicates that observations with large values for the associated variable will tend to be classified in group one, and vice versa.

In the example, the coefficient of +0.6481 for the variable T079 (money for college) means that a high score on this variable will contribute to that observation being classified as high quality. As in the discriminant model example, this indicates that a high quality individual will tend to be positively motivated to enlist in the Army by the incentive of earning money to attend college. Again as in the discriminant model example, the coefficient of -0.4317 for variable T082 (money for vo-tech school) indicates that the incentive of earning money to attend vocational or technical school has the opposite effect. These coefficients are also very similar in magnitude to those of the discriminant model example; the main difference lies in the dividing point. If we move the alpha term (-0.4932) in the logit equation to the other side of the equation (which of course changes its sign, yielding +0.4932), this term plays exactly the same role as the discriminant model dividing point, which had a value of +0.8130. Since individuals are assigned to the high quality group if the value of the assignment function is greater than the dividing point, and the coefficients of the two functions are similar, in this example fewer individuals will be assigned to the high quality group when the discriminant model is used. This difference between the two models could be due to the assumptions required by the discriminant model. The accuracy of the classification results for each model will be discussed later in the analysis portion of the study.
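To see how the two dividing points interact with the similar coefficients, the Python sketch below applies both rules to all 16 possible pairs of T079 and T082 ratings and counts the cells each rule assigns to the high quality group. It counts response cells, not respondents: how many individuals are affected depends on how the NRS responses are distributed over the cells, which this sketch does not model. The names are illustrative.

```python
# Single discriminant function (Equation 5) and logit equation (Equation 15),
# each written as "score > dividing point"; for the logit the dividing point is -alpha.
DISC_COEFFS, DISC_CUT = (0.6342, -0.4046), 0.8130
LOGIT_COEFFS, LOGIT_CUT = (0.648140, -0.431725), 0.493233


def is_high_quality(x, coeffs, cut):
    return sum(a * xi for a, xi in zip(coeffs, x)) > cut


# Every possible (T079, T082) rating pair; ratings run from 1 to 4.
cells = [(t079, t082) for t079 in range(1, 5) for t082 in range(1, 5)]
disc_hq = {x for x in cells if is_high_quality(x, DISC_COEFFS, DISC_CUT)}
logit_hq = {x for x in cells if is_high_quality(x, LOGIT_COEFFS, LOGIT_CUT)}
print(len(disc_hq), len(logit_hq), sorted(logit_hq - disc_hq))  # 7 8 [(3, 3)]
```

Every cell the discriminant rule assigns to the high quality group is also assigned there by the logit rule; the only disagreement is the cell where both incentives are rated 3.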











IV. ANALYSIS 


A. OBJECTIVE 

As mentioned previously, our primary objective is to identify enlistment incentives that motivate high quality recruits to enlist in the Army. Further, we want to contrast the results of discriminant analysis and logistic regression in identifying these incentives. To accomplish these objectives, we first developed models using both discriminant analysis and logistic regression to classify recruits as either high quality or other based on a set of explanatory variables. Additionally, we analyzed the models to determine the relative importance of each explanatory variable in the classification of high quality recruits, in order to identify the factors providing the greatest enlistment incentives to this group. Lastly, we compared the results from the two models.


B. EXPLANATORY VARIABLES 


1. Variables Initially Selected 

In choosing variables to use in the analysis, two primary considerations were
addressed. First, we believe that certain strictly demographic factors will affect one’s
motivation to enlist in the Army. Of the NRS variables available in this category, we
felt that race, marital status, additional education since high school, and potential time
in the job market were the variables that would most influence enlistment motivation.

Of the 1988 NRS respondents, 99% had no additional education since high
school, so this variable was dropped from consideration. For race, fewer than 4% of the
respondents were listed as Indian/Alaskan or Asian/Pacific, so race was transformed
into a dichotomous variable with the possible responses White/Other or Black.
The Indian/Alaskan and Asian/Pacific categories were added to the White/Other
response because of the relatively small increase this causes to the White/Other
group. Additionally, only 0.1% of the enlistees answered the marital status question
with divorced, so these responses were added to the single category to produce another
dichotomous variable having the possible responses of either married or single.
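The recoding described above collapses each multi-level response into two levels. The sketch below shows one way to do this. The category labels and the 0/1 coding (1 = Black, 1 = Married) are assumptions for illustration, because the NRS codebook values are not given here.

```python
# Hypothetical category labels; the actual NRS response codes are not shown here.
RACE_GROUPS = {
    "White": "White/Other",
    "Indian/Alaskan": "White/Other",  # under 4% combined with Asian/Pacific
    "Asian/Pacific": "White/Other",
    "Black": "Black",
}

MARITAL_GROUPS = {
    "Single": "Single",
    "Divorced": "Single",  # about 0.1% of enlistees; folded into Single
    "Married": "Married",
}


def collapse_levels(value, groups):
    """Map a multi-level survey response onto one of two levels."""
    try:
        return groups[value]
    except KeyError:
        raise ValueError(f"unexpected category: {value!r}") from None


def race_indicator(race):
    """Assumed 0/1 coding: 1 for Black, 0 for White/Other."""
    return int(collapse_levels(race, RACE_GROUPS) == "Black")


def married_indicator(status):
    """Assumed 0/1 coding: 1 for Married, 0 for Single (including Divorced)."""
    return int(collapse_levels(status, MARITAL_GROUPS) == "Married")


print(race_indicator("Asian/Pacific"), married_indicator("Divorced"))  # 0 0
```

Raising an error on an unknown label, instead of silently assigning a default level, makes unexpected codes visible during data preparation.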

The variable to measure potential time in the job market deserves special 
explanation. We believe that potential time in the job market can be approximated by 
the time between the date that the respondent graduated from high school and the 
date that he took the NRS survey. During this time the individual can be considered 
in the job market. Although we really have no knowledge of whether the person 
actually was working, or seeking work, we believe that this period may contribute to 
the individual’s enlistment motivations. Both the high school graduation date and the 
survey date are available in the NRS data base, so by simply subtracting one from the 
other, we arrive at the number of months that the person was potentially in the job 
market. This variable was divided into two levels, where the first level is less than or
equal to one year, and the second level is greater than one year, in the potential job
market.
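The potential job-market variable is a date difference followed by a cut at one year. This is a minimal sketch under two assumptions: both dates are resolved to whole calendar months, and the two levels are coded 1 and 2. The survey may record dates at a different resolution.

```python
from datetime import date


def months_between(start, end):
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def labor_force_level(hs_grad, survey):
    """Level 1: at most 12 months of potential job-market time; level 2: more."""
    months = months_between(hs_grad, survey)
    if months < 0:
        raise ValueError("survey date precedes graduation date")
    return 1 if months <= 12 else 2


print(labor_force_level(date(1987, 6, 1), date(1988, 3, 15)))  # 1 (9 months)
print(labor_force_level(date(1986, 6, 1), date(1988, 3, 15)))  # 2 (21 months)
```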

The second consideration was the use of the stated reasons for enlisting in the Army as
possible explanatory variables. The questions in the survey most indicative of
enlistment motivations are the twenty-two weighted response questions that
specifically address reasons for enlisting. In these questions, the respondent is
presented with a particular reason for enlisting in the Army and must rate the
importance of this reason toward his decision to enlist. An answer of "1" indicates that
the reason was of no importance, "2" indicates fairly important, "3" indicates very
important, and an answer of "4" indicates that the respondent would not have joined
except for this reason.

While we believe that these questions can be used as good indicators of 
motivation to enlist in the Army, we do not believe that these variables can be 
considered independent. In order to deal with the dependence between these variables 
and to also reduce the number of explanatory variables in our analysis, we used 
principal factor analysis to identify the relationships between these variables and to 
help develop new orthogonal variables to use in our analytical models. The results of 
this factor analysis are covered in the next section. 

2. Factor Analysis 
a. General 

As discussed in the previous section, we believe that the twenty-two 
weighted response variables in the NRS may be good predictors of enlistment 
motivators, but we also believe that they are correlated with each other. To limit this 
dependence among the variables, we used factor analysis to develop a new set of 
variables which have a minimum of correlation with each other. The basic idea behind 
factor analysis is that the original set of variables can be described by a smaller 
underlying set of factors. Factor analysis is a formal method of determining how many 
of these underlying factors exist and the weight that each of the original variables 
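contributes to the individual factors [Ref. 10:p. 9]. In effect, each original variable is
expressed as a linear combination of the smaller set of underlying factors, and scores on
those factors can in turn be estimated as linear combinations of the original variables.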
b. Questions With Greatest Loadings per Factor
The factor analysis procedure identified four factors underlying the 
twenty-two weighted response variables. The factors can be subjectively named by 
observing which reasons are weighted most heavily in the rotated factor pattern. The 
four factors with their subjective names and most heavily weighted variables are listed 
below. See Appendix C for a complete table of factor loadings. 


(1) Factor 1 - Better myself

• Importance of becoming a responsible person 0.72
• Importance of becoming more self-reliant 0.69
• Importance of becoming a better individual 0.66
• Importance of a chance to better myself 0.51
• Importance of money for college 0.24

(2) Factor 2 - Serve my country/be a leader

• Importance of wanting to be a soldier 0.67
• Importance of serving my country 0.64
• Importance of leadership training 0.49
• Importance of physical training 0.46
• Importance of proving I can make it 0.34
• Importance of family tradition to serve 0.31
(3) Factor 3 - Money/benefits/job

• Importance of fringe benefits 0.58
• Importance of retirement benefits 0.53
• Importance of getting a better job 0.51
• Importance of skill training 0.43
• Importance of earning more money 0.41
• Importance of money for vo-tech school 0.29
• Importance of unemployment 0.23

(4) Factor 4 - Get away from home/travel

• Importance of being away from home 0.43
• Importance of time to decide life plans 0.39
• Importance of escaping a personal problem 0.36
• Importance of travel 0.29

3. Final Variables Selected

Based on the subjective beliefs concerning demographic variables and the
factor analysis mentioned above, the following variables were selected for inclusion in
our models:

• Race
• Marital Status
• Potential Experience in the Labor Force
• Factor 1 (Better myself)
• Factor 2 (Serve my country/be a leader)
• Factor 3 (Money/benefits/job)
• Factor 4 (Get away from home/travel)


C. RESULTS OF COMBINED MODELS 

The initial models developed with the variables listed above are termed
"combined models" because both racial categories were included in the factor analysis
and in the two models. Some of the results discussed below indicate that this may not
be the best procedure, so alternative methods and their results are also presented.


1. Discriminant Analysis Model 


a. Classification Equations 

Using the variables described above, the SAS procedure DISCRIM was 
used to conduct a discriminant analysis between the high quality and non-high quality 
survey respondents. The standard procedure output is one classification equation for 
each group. Observations are then assigned to the group on which they have the 
highest score based on these classification equations. The two equations are listed 
below. 

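$$
\begin{aligned}
Z_{\text{high}} &= -0.24 + 0.81(\text{Race}) + 0.67(\text{Marital Status}) + 1.27(\text{Labor Force}) \\
&\quad - 0.09(\text{Factor 1}) - 0.04(\text{Factor 2}) - 0.18(\text{Factor 3}) + 0.09(\text{Factor 4}) \\
Z_{\text{other}} &= -0.77 + 2.58(\text{Race}) + 1.11(\text{Marital Status}) + 1.20(\text{Labor Force}) \\
&\quad - 0.20(\text{Factor 1}) + 0.20(\text{Factor 2}) + 0.11(\text{Factor 3}) + 0.07(\text{Factor 4})
\end{aligned}
$$

Equation 16 Discriminant Classification Equations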
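The assignment rule for the two classification equations can be sketched directly: evaluate both linear functions and pick the larger one. The slopes are transcribed from Equation 16. The signs of the two constant terms were reconstructed from a damaged copy and should be treated as uncertain. The respondent coding (0/1 indicators, factor scores at their mean of 0) is a hypothetical illustration.

```python
# Coefficients from Equation 16, rounded to two decimals. The signs of the
# two constant terms are reconstructed and may be off.
VARIABLES = ("race", "marital", "labor", "f1", "f2", "f3", "f4")

HIGH = {"const": -0.24, "race": 0.81, "marital": 0.67, "labor": 1.27,
        "f1": -0.09, "f2": -0.04, "f3": -0.18, "f4": 0.09}
OTHER = {"const": -0.77, "race": 2.58, "marital": 1.11, "labor": 1.20,
         "f1": -0.20, "f2": 0.20, "f3": 0.11, "f4": 0.07}


def score(coefs, x):
    """Evaluate one linear classification function at observation x."""
    return coefs["const"] + sum(coefs[v] * x[v] for v in VARIABLES)


def classify(x):
    """Assign x to the group whose classification function is larger."""
    return "high" if score(HIGH, x) > score(OTHER, x) else "other"


# Hypothetical respondent; the 0/1 coding of race, marital status and
# labor force is assumed, and factor scores are set to their mean of 0.
respondent = {"race": 0, "marital": 0, "labor": 1,
              "f1": 0.0, "f2": 0.0, "f3": 0.0, "f4": 0.0}
print(classify(respondent))  # high
```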





b. Classification Results 





Based on the equations above, the classification results (using the same 
data as the coefficients were generated from) are shown in Table 3 below. 
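Each cell of the classification tables in this chapter appears to give a count together with its percentage of the actual group, so each row sums to 100%. The following sketch shows how such a cross-tabulation can be built from actual and assigned labels. The labels and toy data are hypothetical.

```python
from collections import Counter

GROUPS = ("high", "other")


def classification_table(actual, predicted, groups=GROUPS):
    """Cross-tabulate actual vs. assigned group.

    Returns {actual: {assigned: (count, row_percent)}}, where each row's
    percentages are relative to that actual group's total.
    """
    counts = Counter(zip(actual, predicted))
    table = {}
    for a in groups:
        row_total = sum(counts[(a, p)] for p in groups)
        table[a] = {
            p: (counts[(a, p)],
                100.0 * counts[(a, p)] / row_total if row_total else 0.0)
            for p in groups
        }
    return table


def format_cell(count, pct):
    """Render a cell in the 'count (percent%)' style of the tables."""
    return f"{count} ({pct:.2f}%)"


actual = ["high", "high", "high", "other", "other"]
assigned = ["high", "high", "other", "other", "high"]
for a, row in classification_table(actual, assigned).items():
    print(a, *(format_cell(*row[p]) for p in GROUPS))
# high 2 (66.67%) 1 (33.33%)
# other 1 (50.00%) 1 (50.00%)
```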


TABLE 3 DISCRIMINANT MODEL CLASSIFICATION RESULTS 











2. Logistic Regression Model 
a. Classification Equation 
The SAS procedure LOGIST was used to perform logistic regression
using the quality variable as the response variable and the variables
described above as the explanatory variables. The model and coefficients generated
are shown in Equation 17 below.


b. Classification Results
Based on Equation 17, the classification results are shown in Table 4
below.













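$$
P[\text{Quality}=\text{High}] = \frac{1}{1+e^{-(\alpha+\beta X)}}
$$

where α is the intercept term and

$$
\begin{aligned}
\beta X = {}& -1.60(\text{Race}) - 0.41(\text{Marital Status}) + 0.08(\text{Labor Force}) + 0.12(\text{Factor 1}) \\
& - 0.24(\text{Factor 2}) - 0.30(\text{Factor 3}) + 0.02(\text{Factor 4})
\end{aligned}
$$

Equation 17 Logistic Classification Equation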
TABLE 4 LOGISTIC MODEL CLASSIFICATION RESULTS 




3. Comparison of The Two Methods 










a. Theoretical 
As mentioned earlier, the discriminant classification equation and the 
logit function are both linear in the explanatory variables. Additionally, except when 
model assumptions are violated, we would expect results from the two procedures to 
be quite similar. If the explanatory variables are multivariate normal (as assumed by 
the discriminant model), then the same level of precision as with logistic regression 
can be achieved even when a smaller sample size is used [Ref. 11:p. 291]. However,
"the estimates of the coefficients or the probabilities derived from the two methods
will rarely be substantially different from each other, whether or not the multivariate
normality assumption is satisfied" [Ref. 11:p. 291].
b. Observed

As expected, the results indicate that the two methods are fairly similar
in classifying respondents. Although the discriminant procedure is better at
classifying the other category of respondents and the logistic procedure is better at
classifying the high quality respondents, these differences are fairly small. Further,
the previous results only allow us to compare the two methods based on their relative
classification results. We can, however, arrive at a more direct comparison of the two
classification methods with some simple manipulation of the respective classification
equations.

By subtracting the corresponding coefficients of the two classification
equations from the discriminant analysis, we can generate a single classification
equation. Further, by subtracting the constant term of the high quality equation from
that of the other equation, we find a "dividing point" for our equation. Now, by
evaluating an observation on this new equation, we can classify the individual
depending on whether the resulting value of the equation is greater than or less than
the dividing point.
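The subtraction step can be checked numerically. This sketch collapses two linear classification functions into one rule and applies it to the slopes of Equation 16. The helper name `collapse_discriminant` is hypothetical. The constant terms are left out of the usage because their signs are uncertain in this copy.

```python
def collapse_discriminant(high, other):
    """Reduce two linear classification functions to a single rule.

    score_high > score_other  <=>  sum(diff[v] * x[v]) > dividing_point,
    with diff[v] = high[v] - other[v] and
    dividing_point = other["const"] - high["const"].
    """
    diff = {v: high[v] - other[v] for v in high if v != "const"}
    dividing_point = other.get("const", 0.0) - high.get("const", 0.0)
    return diff, dividing_point


# Slopes from Equation 16 (constant terms left out of this check).
HIGH = {"race": 0.81, "marital": 0.67, "labor": 1.27,
        "f1": -0.09, "f2": -0.04, "f3": -0.18, "f4": 0.09}
OTHER = {"race": 2.58, "marital": 1.11, "labor": 1.20,
         "f1": -0.20, "f2": 0.20, "f3": 0.11, "f4": 0.07}

diff, _ = collapse_discriminant(HIGH, OTHER)
print({v: round(d, 2) for v, d in diff.items()})
# {'race': -1.77, 'marital': -0.44, 'labor': 0.07, 'f1': 0.11, 'f2': -0.24, 'f3': -0.29, 'f4': 0.02}
```

Allowing for rounding of the published two-decimal coefficients, these differences agree with the discriminant column of the comparison equations (for example, -0.44 here versus -0.43 there).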

Similarly, if we use the "log odds" form of the logistic regression
equation, we have an equation of the same form as the single discriminant equation
above. In fact, the discriminant coefficients could have been used in the logistic model
in the first place, but using maximum likelihood estimates instead allows us to avoid
the multivariate normality requirement of discriminant analysis. These new equations
are shown in Equation 18 below.












Discriminant: classify high if Z > -0.55, where

$$
\begin{aligned}
Z = {}& -1.77(\text{Race}) - 0.43(\text{Marital Status}) + 0.08(\text{Labor Force}) + 0.12(\text{Factor 1}) \\
& - 0.24(\text{Factor 2}) - 0.29(\text{Factor 3}) + 0.02(\text{Factor 4})
\end{aligned}
$$

Logistic: classify high if Z > -0.84, where

$$
\begin{aligned}
Z = {}& -1.60(\text{Race}) - 0.41(\text{Marital Status}) + 0.08(\text{Labor Force}) + 0.12(\text{Factor 1}) \\
& - 0.24(\text{Factor 2}) - 0.30(\text{Factor 3}) + 0.02(\text{Factor 4})
\end{aligned}
$$

Equation 18 Comparison of Classification Equations


These equations indicate that potential labor force experience, Factor
1, and Factor 4 are important in determining if a respondent is classified as high
quality. This gives some indication that high quality enlistees spent more potential
time in the labor force prior to joining the Army. Additionally, high quality
respondents were more interested in becoming better, more responsible people and
having an opportunity to travel, as indicated by Factor 1 and Factor 4 respectively.
Conversely, Marital Status, Factor 2, and Factor 3 all have negative coefficients,
indicating that higher values of these variables push an individual toward the other
group rather than the high quality group (the race variable is not discussed here; the
section below explains why). This indicates that if a high quality individual is married,
he may be less inclined to join the Army. Also, the negative coefficient associated with
Factor 2 indicates that high quality recruits were less likely to be motivated by a desire
to serve when they enlisted in the Army. Similarly, the negative coefficient for Factor
3 indicates that high quality recruits are less interested in incentives directly
associated with monetary compensation, getting a job, or future benefits such as
retirement.
Unfortunately, neither the labor force variable nor Factor 4 is
significant at the 0.05 level (all other variables are significant at this level) based on
the logistic regression model. Naturally, this leads to some skepticism regarding any
conclusions drawn from these variables despite the relatively accurate classification
results.
4. Problems 

While the preceding results appear to be encouraging, a closer analysis
indicates that both discriminant analysis and logistic regression are poor at classifying
black respondents. Tables 5 through 8 below indicate the classification results for each
procedure by racial category.


TABLE 5 DISCRIMINANT MODEL (WHITE ONLY) 


Actual 












TABLE 6 LOGISTIC MODEL (WHITE ONLY) 


Actual 
1322 (99.10%) 
584 (98.82%) 



















TABLE 7 DISCRIMINANT MODEL (BLACK ONLY) 


| Actual | Classified As Group 


TABLE 8 LOGISTIC MODEL (BLACK ONLY) 


228 (99.56%)
465 (99.57%)






















Clearly, the "combined" models do not accurately model quality within the two racial
categories. We believe that this may be due to sociological differences
which influence incentives that may vary between the two racial groups. Since the
sample population is mainly (73.4%) white, the sociological characteristics of black
respondents could be misrepresented in the factor analysis procedure. To attempt to
correct this deficiency, we replicated the previous work separately for each racial
group. The results of these "separated" models are presented in the next section.


D. RESULTS OF MODELS FOR BLACK GROUP ONLY 
1. Factor Analysis 
The factor analysis procedure for the black only racial group again identified
four factors underlying the twenty-two weighted response variables. The first three
factors are close to the first three factors in the combined factor analysis; however, the
fourth factor doesn't appear to follow a single distinct pattern. The factors can be
subjectively named by observing which reasons are weighted most heavily in the
rotated factor pattern. The four factors with their subjective names and most heavily
weighted variables are listed below (the subscript "B" is added to the factor number to
indicate that the factors were derived from the black respondents only). See Appendix
C for a complete table of factor loadings.


a. Factor 1B - Better myself

• Importance of becoming a responsible person 0.70
• Importance of becoming more self-reliant 0.63
• Importance of becoming a better individual 0.59
• Importance of a chance to better myself 0.55

b. Factor 2B - Serve my country/be a leader

• Importance of wanting to be a soldier 0.70
• Importance of serving my country 0.67
• Importance of leadership training 0.50
• Importance of physical training 0.45
• Importance of travel 0.23

c. Factor 3B - Money/benefits/job

• Importance of fringe benefits 0.57
• Importance of retirement benefits 0.54
• Importance of getting a better job 0.44
• Importance of money for vo-tech school 0.44
• Importance of skill training 0.44
• Importance of earning more money 0.43
• Importance of money for college 0.33

d. Factor 4B - Other

• Importance of time to decide life plans 0.46
• Importance of being away from home 0.43
• Importance of escaping a personal problem 0.42
• Importance of unemployment 0.39
• Importance of family tradition to serve 0.35
• Importance of proving I can make it 0.33


2. Discriminant Analysis Model 


a. Classification Equations 
Using the variables described above, the SAS procedure DISCRIM was 
used to conduct a discriminant analysis between the high quality and non-high quality 
survey respondents. The standard procedure output is one classification equation for 
each group. Observations are then assigned to the group on which they have the 
highest score based on these classification equations. The two equations are listed 


below. 
$$
\begin{aligned}
Z_{\text{high}} &= -0.17 + 0.38(\text{Marital Status}) + 1.13(\text{Labor Force}) - 0.01(\text{Factor } 1_B) \\
&\quad - 0.28(\text{Factor } 2_B) - 0.08(\text{Factor } 3_B) - 0.11(\text{Factor } 4_B) \\
Z_{\text{other}} &= -0.26 + 0.73(\text{Marital Status}) + 1.55(\text{Labor Force}) - 0.05(\text{Factor } 1_B) \\
&\quad + 0.11(\text{Factor } 2_B) - 0.12(\text{Factor } 3_B) + 0.13(\text{Factor } 4_B)
\end{aligned}
$$

Equation 19 Discriminant Model (Black Only)


b. Classification Results 
Based on the equations above, the classification results are in Table 9 below. 


TABLE 9 CLASSIFICATION RESULTS (BLACK ONLY) 


4 (39.44%) 









3. Logistic Regression Model 
a. Classification Equation 
The SAS procedure LOGIST was used to perform logistic regression
using the quality variable as the response variable and the variables
described above as the explanatory variables. The model and coefficients generated
are shown in Equation 20 below.
b. Classification Results

The classification results are in Table 10 below.
$$
P[\text{Quality}=\text{High}] = \frac{1}{1+e^{-(\alpha+\beta X)}}
$$

where

$$
\alpha = -0.67
$$

and

$$
\begin{aligned}
\beta X = {}& -0.41(\text{Marital Status}) - 0.43(\text{Labor Force}) + 0.05(\text{Factor } 1_B) \\
& - 0.40(\text{Factor } 2_B) + 0.04(\text{Factor } 3_B) - 0.27(\text{Factor } 4_B)
\end{aligned}
$$

Equation 20 Logistic Model (Black Only)
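The fitted probability from Equation 20 can be evaluated directly. This sketch shows why the default 0.5 threshold sends most black respondents to the other group: with all explanatory variables at 0, the negative intercept alone gives a probability of about 0.34. The zero-valued respondent and the variable coding are hypothetical.

```python
import math

# Intercept and slopes from Equation 20 (Black only).
ALPHA = -0.67
BETA = {"marital": -0.41, "labor": -0.43, "f1": 0.05,
        "f2": -0.40, "f3": 0.04, "f4": -0.27}


def p_high(x, alpha=ALPHA, beta=BETA):
    """P[Quality = High] = 1 / (1 + exp(-(alpha + beta . x)))."""
    z = alpha + sum(beta[v] * x[v] for v in beta)
    return 1.0 / (1.0 + math.exp(-z))


def assign(x, threshold=0.5):
    """Assign to 'high' when the fitted probability exceeds the threshold."""
    return "high" if p_high(x) > threshold else "other"


# Hypothetical respondent with every explanatory variable at 0 (the
# variable coding is assumed), so the probability depends on alpha alone.
x0 = {v: 0.0 for v in BETA}
print(round(p_high(x0), 2), assign(x0))  # 0.34 other
```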
TABLE 10 CLASSIFICATION RESULTS (BLACK ONLY) 


Actual Classified As Group 
3 (01.00%) 210 (99.00%) 




4. Comparison of The Two Methods 






The previous results only allow us to compare the two methods based on
their relative classification results. We can, however, arrive at a more direct
comparison of the two classification methods with some simple manipulation of the
respective classification equations.

The procedures to generate these equations were described earlier and are 


not repeated here. The new equations are shown in Equation 21 below. 
Discriminant: classify high if Z > -0.098, where

$$
\begin{aligned}
Z = {}& -0.34(\text{Marital Status}) - 0.42(\text{Labor Force}) + 0.04(\text{Factor } 1_B) \\
& - 0.39(\text{Factor } 2_B) + 0.04(\text{Factor } 3_B) - 0.24(\text{Factor } 4_B)
\end{aligned}
$$

Logistic: classify high if Z > 0.666, where

$$
\begin{aligned}
Z = {}& -0.41(\text{Marital Status}) - 0.43(\text{Labor Force}) + 0.05(\text{Factor } 1_B) \\
& - 0.40(\text{Factor } 2_B) + 0.04(\text{Factor } 3_B) - 0.27(\text{Factor } 4_B)
\end{aligned}
$$

Equation 21 Comparison of Classification Equations (Black Only)


These equations indicate that the logistic equation is so poor at correctly
classifying high quality respondents because of its unusually high dividing point
(intercept term): 0.666, compared with -0.098 for the discriminant model. Later, we
will present a technique to compensate for this fact and improve the classification
results for the logistic model.


E. RESULTS OF MODELS FOR WHITE GROUP ONLY 
1. Factor Analysis 

The factor analysis procedure for the white only racial group again 
identified four factors underlying the twenty-two weighted response variables. All four 
factors are close to the factors identified in the combined factor analysis. The factors 
can be subjectively named by observing which reasons are weighted most heavily in 
the rotated factor pattern. The four factors with their subjective names and most 
heavily weighted variables are listed below (the subscript "W" is added to the factor 


number to indicate that the factors were derived from the white respondents only). 


See Appendix C for a complete table of factor loadings. 





a. Factor 1W - Better myself

• Importance of becoming a responsible person 0.77
• Importance of becoming a better individual 0.75
• Importance of becoming more self-reliant 0.72
• Importance of a chance to better myself 0.54
• Importance of leadership training 0.48
• Importance of physical training 0.43

b. Factor 2W - Serve my country/be a soldier

• Importance of wanting to be a soldier 0.61
• Importance of serving my country 0.57
• Importance of proving I can make it 0.36
• Importance of family tradition to serve 0.31

c. Factor 3W - Benefits/job

• Importance of fringe benefits 0.56
• Importance of getting a better job 0.53
• Importance of retirement benefits 0.49
• Importance of skill training 0.45
• Importance of earning more money 0.42
• Importance of unemployment 0.30

d. Factor 4W - Travel/education

• Importance of money for college 0.44
• Importance of money for vo/tech school 0.39
• Importance of time to decide life plans 0.39
• Importance of being away from home 0.38
• Importance of travel 0.34
• Importance of escaping a personal problem 0.24


2. Discriminant Analysis Model 


a. Classification Equations 
As with the Black only model, the SAS procedure DISCRIM was used
to conduct a discriminant analysis between the high quality and other survey
respondents. Again, observations are assigned to the group on which they have the
highest score based on these classification equations. The two equations are listed in
Equation 22 below.


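$$
\begin{aligned}
Z_{\text{high}} &= -0.18 + 0.66(\text{Marital Status}) + 1.29(\text{Labor Force}) - 0.06(\text{Factor } 1_W) \\
&\quad - 0.04(\text{Factor } 2_W) - 0.19(\text{Factor } 3_W) + 0.16(\text{Factor } 4_W) \\
Z_{\text{other}} &= -0.19 + 0.66(\text{Marital Status}) + 1.21(\text{Labor Force}) + 0.01(\text{Factor } 1_W) \\
&\quad + 0.18(\text{Factor } 2_W) + 0.22(\text{Factor } 3_W) - 0.10(\text{Factor } 4_W)
\end{aligned}
$$

Equation 22 Discriminant Model (White Only)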
b. Classification Results
Based on the equations above, the classification results are in Table 11 
below. 


TABLE 11 CLASSIFICATION RESULTS (WHITE ONLY) 


Actual | Classified As Group
       | High | Other









3. Logistic Regression Model 


a. Classification Equation 
As before, the SAS procedure LOGIST was used to perform logistic 


regression. The model and coefficients generated are shown in Equation 23 below. 


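$$
P[\text{Quality}=\text{High}] = \frac{1}{1+e^{-(\alpha+\beta X)}}
$$

where α is the intercept term and

$$
\begin{aligned}
\beta X = {}& 0.01(\text{Marital Status}) + 0.08(\text{Labor Force}) - 0.07(\text{Factor } 1_W) \\
& - 0.21(\text{Factor } 2_W) - 0.40(\text{Factor } 3_W) + 0.26(\text{Factor } 4_W)
\end{aligned}
$$

Equation 23 Logistic Model (White Only)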








b. Classification Results 


Based on the equation above, the classification results are in Table 12 
below. 


TABLE 12 CLASSIFICATION RESULTS (WHITE ONLY) 













4. Comparison Of The Two Methods 
As discussed in the Black only analysis section, we make some simple 

manipulations of the above classification equations to arrive at a more direct 
comparison of the two classification methods. The results of this process are shown 
in Equation 24 below. 

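Discriminant: classify high if Z > -0.011, where

$$
\begin{aligned}
Z = {}& -0.00(\text{Marital Status}) + 0.08(\text{Labor Force}) - 0.07(\text{Factor } 1_W) \\
& - 0.21(\text{Factor } 2_W) - 0.40(\text{Factor } 3_W) + 0.26(\text{Factor } 4_W)
\end{aligned}
$$

Logistic: classify high if Z > -0.802, where

$$
\begin{aligned}
Z = {}& 0.01(\text{Marital Status}) + 0.08(\text{Labor Force}) - 0.07(\text{Factor } 1_W) \\
& - 0.21(\text{Factor } 2_W) - 0.40(\text{Factor } 3_W) + 0.26(\text{Factor } 4_W)
\end{aligned}
$$

Equation 24 Comparison of Classification Equations (White Only)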


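Just as in the Black only analysis, we observe that the logistic dividing point differs
markedly from the discriminant one. Here it is unusually low (-0.802, compared with
-0.011 for the discriminant model), so logistic regression assigns nearly all respondents
to the high quality group and poorly classifies the other group. The next section
presents a technique to compensate for this fact and improve the classification results
for the logistic model.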

In contrast to the Black only equations, the coefficients associated with the
Marital Status, Labor Force, Factor 1, Factor 3, and Factor 4 variables are of opposite
sign in the White only equations. This suggests that these variables act in opposite
directions on high quality classification depending on race. Recall, however, that
the respective factor variables are not identical for each racial category and as such
cannot be directly compared. These results and their interpretation for each racial
category will be discussed further in the conclusion section of the thesis.


F. ADJUSTED LOGISTIC MODEL 
1. General 
The results of the previous section show that modeling the data separately
by race improves the classification results for the discriminant models but not for
the logistic models.
Recall that the logistic model merely estimates a probability of group membership.
Each observation is assigned to the high quality group if it has a greater than 0.5
probability of being in that group based on the explanatory variables; otherwise, the
observation is assigned to the other group. We may, however, specify a different
threshold probability in order to attempt to correct the poor results of the logistic model.
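Written out, a threshold probability p* other than 0.5 shifts the dividing point on the linear score. The short derivation below uses the α and βX of the logistic equations:

$$
\frac{1}{1+e^{-(\alpha+\beta X)}} > p^{*}
\iff e^{-(\alpha+\beta X)} < \frac{1-p^{*}}{p^{*}}
\iff \alpha+\beta X > \ln\frac{p^{*}}{1-p^{*}}
\iff \beta X > \ln\frac{p^{*}}{1-p^{*}} - \alpha .
$$

At p* = 0.5 the logarithm is zero and the dividing point reduces to -α. Lowering p* lowers the dividing point and assigns more observations to the high quality group. Raising p* does the opposite.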
2. Adjusted Assignment Probability 
As discussed earlier, we can influence the classification procedure in the
logistic model by adjusting the probability threshold level for group assignment. For
the Black race category, the default threshold of 0.5 was shown to assign 98% of the
respondents to the other group when only 68% of the respondents were actually in
this group. For the White race category, 98% of the respondents were assigned to the
high quality group when only about 69% of the respondents were actually in the high
quality group.

These classification results indicate that the threshold probability for the 
Black race category is too high and that the threshold probability for the White race 
category is too low. Assuming that we would desire the classification results to be 
similar to those for the discriminant models, we can experiment with different 
threshold probabilities to accomplish this goal. 

For the Black race category, a threshold probability of 0.325 results in the 
classification results shown in Table 13. For the White race category, a threshold 
probability of 0.685 results in the classification results in Table 14. 
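The experimentation with threshold probabilities can be made explicit. The sketch below converts a threshold into the equivalent cutoff on βX. It also shows one hypothetical way to automate the search: a grid scan for the smallest threshold at which the share assigned to "high" falls to a chosen target. The thesis does not say how 0.325 and 0.685 were found, so the search procedure and the target share are assumptions.

```python
import math


def logit(p):
    """Log-odds of a probability 0 < p < 1."""
    return math.log(p / (1.0 - p))


def dividing_point(alpha, threshold):
    """Cutoff on beta . x implied by assigning 'high' when P > threshold."""
    return logit(threshold) - alpha


def threshold_for_share(probs, target_high_share, step=0.005):
    """Smallest grid threshold at which the share of observations assigned
    to 'high' (probability above the threshold) is at most the target."""
    n_steps = int(round(1.0 / step))
    for i in range(1, n_steps):
        t = i * step
        share = sum(p > t for p in probs) / len(probs)
        if share <= target_high_share:
            return t
    return None


# Black only (alpha = -0.67 from Equation 20): default vs. adjusted cutoff.
print(round(dividing_point(-0.67, 0.5), 3))    # 0.67
print(round(dividing_point(-0.67, 0.325), 3))  # -0.061
```

Lowering the threshold from 0.5 to 0.325 moves the Black-only logistic cutoff from about 0.67 to about -0.06, which is close to the discriminant dividing point of -0.098.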


TABLE 13 CLASSIFICATION RESULTS (BLACK ONLY, p=0.325) 


Actual 
89 (41.78%)













TABLE 14 CLASSIFICATION RESULTS (WHITE ONLY, p=0.685) 


Actual 
259 (43.24%) 340 (56.76%) 



















These results are much closer to the results found in the discriminant 


models for the respective race categories and provide much more balanced correct 


classifications between the high quality and other groups. 


G. CLASSIFICATION RESULTS USING DIFFERENT DATA 

As mentioned before, all classification results reported earlier in the thesis are
computed by classifying the same data that were used to generate the
model coefficients. These data represent only 80% of the entire sample of respondents.
The other 20% of the sample data points were withheld in order to provide another
sample to check the models. The classifications listed in the main analysis of the
thesis were repeated using this smaller data set, and the results were quite similar to
those using the larger data set. All small sample classification tables are listed in
Appendix D.
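The 80/20 holdout described above can be sketched as a seeded random split. This text does not say how the withheld 20% was chosen. Simple random sampling is assumed here, and the seed is an arbitrary value that only makes the split reproducible.

```python
import random


def holdout_split(records, train_fraction=0.8, seed=12345):
    """Randomly split records into a model-building sample and a holdout.

    The seed is arbitrary and only makes the split reproducible.
    """
    rows = list(records)
    random.Random(seed).shuffle(rows)
    cut = round(train_fraction * len(rows))
    return rows[:cut], rows[cut:]


build, holdout = holdout_split(range(100))
print(len(build), len(holdout))  # 80 20
```

The models would be fitted on `build` only and then used to classify `holdout`, which gives a check that is not inflated by reusing the fitting data.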
V. CONCLUSIONS

Our objective has been to identify those enlistment incentives that have the
greatest impact on enlistees in the prime recruiting market. We hypothesized that
the incentives which motivate prime market recruits to join the Army are different for
high quality and non-high quality individuals. Further, we wanted to compare the
results of discriminant analysis and logistic regression in conducting the categorical
data analysis to identify these enlistment incentives. We have been able to identify
enlistment incentives as desired; however, our results indicate that models based on
either technique should be developed separately for each racial group under
consideration. Further, our analysis indicates that there may be certain conditions
under which one model is preferable to the other.


A. COMBINED MODELS 

Due to the poor classification results for Black respondents observed in the 
"combined" models discussed in the previous chapter, these models are considered to 
be of limited value in correctly identifying enlistment incentives. However, the results 
of the "combined" models do provide some indication that the discriminant analysis 


and logistic regression models provide comparable classification results. 


B. SEPARATED MODELS 
Since the combined models' classification results for each racial category were so poor,
we believe that models separated by race are required to accurately identify incentives
important to all sample respondents.
1. Factor Analysis

As expected, conducting factor analysis separately for each racial category 
identified different factors for blacks and whites. Although these differences are not 
dramatic, they confirm the belief that separate models for each race are required. 

2. Model Effectiveness 

The "separated" discriminant analysis models are fairly successful in 
predicting quality group membership for both the Black and the White racial groups. 
Therefore, these models can be effectively used to identify incentives for the high 
quality respondents. 

The "separated" logistic regression models are highly inaccurate in group
classification at the 0.5 threshold probability and must be modified in order to obtain
acceptable classification results. By adjusting the threshold classification probabilities,
more accurate classification results can be achieved.


3. Important Enlistment Incentives 


a. Black Racial Category 
For respondents in the Black racial category, both models identified
Factor 1B and Factor 3B as explanatory variables contributing to high quality
classification. The first of these factors indicates that the black, high quality enlistees
are concerned with becoming more responsible, more self-reliant people. The second,
Factor 3B, indicates that the black, high quality enlistees are concerned with
earning money and receiving benefits in the Army. This factor also includes
such concerns as receiving skill training directly and receiving money to use for
education at vo-tech schools or college. According to the "separated" models, these
incentives were most influential in attracting the high quality black enlistees surveyed
to enlist in the Army, and these incentives may be effective in attracting future
enlistees to join the Army.


b. White Racial Category 

For respondents in the White racial category, both models identified
potential labor force experience and Factor 4W as explanatory variables contributing
to high quality classification. While the potential labor force experience variable does
not specifically address enlistment incentives, it indicates that, for this sample of
recruits, white respondents in the high quality group tended to have more time
between high school and enlisting in the Army. This could indicate that the high
quality white respondents first tried to work or further their education after high
school and decided to join the Army to get help with these ambitions. This theory is
somewhat reinforced by the second variable which contributes to high quality
classification for white respondents. Factor 4W indicates that the high quality white
respondents surveyed joined the Army to get money for college or vo-tech school and
to travel or get away from home to decide their future life plans. All of these reasons
from Factor 4W could be attributed to a person who tried other plans following high
school and later considered the Army as a means to accomplish these previous goals.


C. DISCRIMINANT VS. LOGISTIC MODEL

Based on the results presented in the analysis chapter of this thesis, the discriminant
analysis model appears to be less sensitive to unbalanced group membership in the
data. This suggests that if the empirical distribution of the data is unknown, or if it is
believed to be skewed toward one particular group, the discriminant analysis model
would be preferable. Otherwise, logistic regression may be preferable because it does
not require the distributional assumptions of discriminant analysis, such as
multivariate normality of the explanatory variables within each group. Further, the
logistic regression model provides significance levels for its coefficients, which are not
computed during discriminant analysis. Ideally, both models should be used and their
results compared, as in this thesis, to explore and model the data under observation
most accurately.
APPENDIX A SELECTED FREQUENCY TABLES


TABLE 15 SEX REPORTED ON MEPRS/REQUEST

Sex | Frequency | Percent | Cumulative Frequency | Cumulative Percent
No Match | 72 | | |
Male | 5233 | | |
Female | | | |

TABLE 16 MARITAL STATUS

Marital Status | Frequency | Percent | Cumulative Frequency | Cumulative Percent
Missing | 5 | | |
Divorced | | | |
Annulled | | | 5786 |
TABLE 17 EDUCATION CERTIFICATION

Level | Frequency | Percent | Cumulative Frequency | Cumulative Percent
TABLE 18 TERM OF ENLISTMENT

Enlistment Term | Frequency | Percent | Cumulative Frequency | Cumulative Percent

TABLE 19 AGE AT TIME OF ACCESSION

Age at Accession | Frequency | Percent | Cumulative Frequency | Cumulative Percent
No Match | 72 | | |

TABLE 20 CASH BONUS

Cash Bonus | Frequency | Percent | Cumulative Frequency | Cumulative Percent
Not Received | 5249 | | |

TABLE 21 ACF ELIGIBILITY

Army College Fund | Frequency | Percent | Cumulative Frequency | Cumulative Percent
Not Eligible | | | |

TABLE 22 SELF-REPORTED RACIAL GROUP

Race | Frequency | Percent | Cumulative Frequency | Cumulative Percent
Pacific | | | |
Black | | | |

TABLE 23 MENTAL TEST CATEGORY

Mental Category | Frequency | Percent | Cumulative Frequency | Cumulative Percent


APPENDIX B 1988 NRS INCENTIVE QUESTIONS


The following questions are reprinted from the 1988/89 USAREC Survey Form
(Ref. 14:pp. J).

In the next series of questions, use the following scale to rate HOW
IMPORTANT each of the reasons listed below was in your decision to ENLIST.


1 - Not at all Important
2 - Somewhat Important
3 - Very Important
4 - I would not have enlisted except for this reason

33. I enlisted because I was unemployed and couldn’t find a job.
34. I enlisted to give myself a chance to be away from home on my own.
35. I enlisted because the military will give me a chance to better myself in life.
36. I enlisted because I want to travel and live in different places.
37. I enlisted to get away from a personal problem.
38. I enlisted because I want to serve my country.
39. I enlisted because I can earn more money than as a civilian.
40. I enlisted because it is a family tradition to serve.
41. I enlisted to prove that I can make it.
42. I enlisted to get trained in a skill that will help me get a civilian job when I get out.
43. I enlisted so I can get money for a college education.
44. I enlisted because I want to be a soldier.
45. I enlisted so I can get money for civilian vocational, technical, or business school education.
46. I enlisted for the physical training and challenge.
47. I enlisted to take time out before deciding what I really want to do.
48. I enlisted because men and women are treated as equals in the military.
49. I enlisted because the military experience is beneficial to both men and women soldiers.
50. I enlisted because I want leadership training.
51. I enlisted because I like the retirement benefits.
52. I enlisted because I want the fringe benefits (e.g., health/dental care, low prices in military stores).
53. I enlisted to become a better person.
54. I enlisted to work with sophisticated, high-tech equipment.
55. I enlisted to become self-reliant.
56. I enlisted to learn to be a responsible mature person.
57. I enlisted to obtain a better job than the one I had.
58. Below are some reasons that people join the military. The next two questions
contain very similar sets of reasons. They differ only in a few of the responses.
Please be careful in answering: try to answer each question without comparing
it to the other one.

A. Which of these reasons is your MOST IMPORTANT REASON for enlisting?
(Mark only one)

• I was unemployed.
• To be away from home on my own.
• I want to travel.
• To get away from a personal problem.
• To serve my country.
• Earn more money.
• Family tradition to serve.
• To prove that I can make it.
• To get trained in a skill.
• Money for a college education.

B. Which of these reasons is your MOST IMPORTANT REASON for enlisting?
(Mark only one)

• I was unemployed.
• To be away from home on my own.
• Chance to better myself.
• To get away from a personal problem.
• To serve my country.
• Earn more money.
• Family tradition to serve.
• To prove that I can make it.
• To get trained in a skill.
• Money for a college education.
APPENDIX C FACTOR LOADINGS


TABLE 24 FACTOR LOADINGS ALL RACIAL GROUPS

TABLE 25 FACTOR LOADINGS BLACK GROUP ONLY

Reason for Enlisting
Responsible Person
Self-Reliant
Better Individual
Better Myself
Be a Soldier
Serve My Country
Leadership Training
Physical Training
Travel
Fringe Benefits
Retirement Benefits
Get a Better Job
Vo/Tech Money
Skill Training
Earn More Money
College Money
Decide Life Plans
Be Away from Home
Personal Problem
Unemployment
Family Tradition
Prove I Can Make It

TABLE 26 FACTOR LOADINGS WHITE GROUP ONLY


APPENDIX D CLASSIFICATION RESULTS (20% WITHHELD DATA)


TABLE 27 DISCRIMINANT MODEL (COMBINED)

Actual Group | Classified As High | Classified As Other

TABLE 28 LOGISTIC MODEL (COMBINED)

Actual Group | Classified As High | Classified As Other
 | 166 (55.52%) | 133 (44.48%)

TABLE 29 DISCRIMINANT MODEL (BLACK ONLY)

Actual Group | Classified As High | Classified As Other
 | | 25 (43.10%)
 | | 67 (50.38%)

TABLE 30 LOGISTIC MODEL (BLACK ONLY)








TABLE 31 DISCRIMINANT MODEL (WHITE ONLY) 









TABLE 32 LOGISTIC MODEL (WHITE ONLY) 


Actual Group | Classified As High | Classified As Other

TABLE 33 LOGISTIC MODEL (BLACK ONLY, p=0.325)








TABLE 34 LOGISTIC MODEL (WHITE ONLY, p=0.685)

Actual Group | Classified As High | Classified As Other
High | 230 (61.83%) | 142 (38.17%)